Investigating the Cognitive Attributes... 1 Running Head: USING ATTRIBUTE HIERARCHY METHOD ON A READING TEST Investigating the Cognitive Attributes Underlying Student Performance on a Foreign Language Reading Test: An Application of the Attribute Hierarchy Method

Authors

  • Changjiang Wang
  • Mark J. Gierl
  • Jacqueline P. Leighton
Abstract

Educational assessments are designed to facilitate teaching and learning. To achieve this purpose, the psychology underlying student performance must be well understood, because most tests are based on cognitive problem-solving tasks. This calls for the integration of cognitive psychology with educational assessment. The present study illustrates the use of the attribute hierarchy method (AHM; Leighton, Gierl, & Hunka, 2004) by applying it to a foreign language reading test. The AHM is a psychometric approach that integrates cognitive psychology with educational measurement. The results indicate that the AHM can produce more detailed cognitive diagnostic information, which can be used to guide teaching and learning. The results also demonstrate how the AHM can be used to evaluate cognitive theories in a content domain.

Introduction

Educational assessments are designed to facilitate teaching and learning. However, because of the disjunction between cognitive psychology and educational measurement, most large-scale assessments yield very limited information about why some students perform poorly or how instructional conditions can be modified to improve teaching and learning (National Research Council, 2001). More research is now being devoted to integrating cognitive psychology with educational assessment, and several models for this integration have been proposed. These models include, but are not limited to, the rule-space approach (Tatsuoka, 1995), the tree-based regression approach (Sheehan, 1997), the cognitive design system (Embretson, 1998), evidence-centered design (Mislevy, Steinberg, & Almond, 2002), and the attribute hierarchy method (AHM; Leighton et al., 2004).
These models differ in how they use cognitive information, but they all attempt to identify the cognitive components and processes that affect student performance on tests, yielding cognitive diagnostic feedback that can guide teaching and learning (Gorin, 2002). To date, the first four models have been applied empirically in testing practice with promising results (Gorin, 2002). The AHM, however, a recently proposed model with promising features, has not yet been applied in an empirical study. The present study therefore illustrates the AHM by using it to model and report on problem solving on a foreign language reading test.

The Attribute Hierarchy Method

The AHM (Leighton et al., 2004; see also Gierl, Leighton, & Hunka, 2000, in press) represents an important variation of Tatsuoka’s rule-space approach because it assumes that cognitive attributes are hierarchically related in a model of task performance (Leighton & Gierl, 2006). A cognitive attribute is defined as a description of the procedural or declarative knowledge needed to perform a task in a specific domain (Leighton et al., 2004). In the present study, “attribute” is used as an umbrella term for the cognitive processes and skills students employ to answer reading comprehension items correctly. The assumption that attributes are hierarchically related better reflects human cognition, because cognitive skills do not operate in isolation but belong to a network of interrelated competencies (Kuhn, 2001). Another strength of the AHM is its capacity to guide test development. Once the attribute hierarchy for a domain is identified, test developers can create items according to the hierarchical organization of the attributes. In this way, test developers achieve maximum control over the specific attributes each item measures.
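To make the hierarchical assumption concrete, the Python sketch below enumerates the attribute patterns permitted by Hierarchy 1 as it is described in the Results (BA → US → UT, with PGS, INF, and WM each requiring UT), fixing BA as mastered in line with the study's assumption. The attribute ordering, the dictionary representation, and the function names all_prereqs and permissible_patterns are illustrative assumptions rather than the authors' implementation.

```python
from itertools import product

# Assumed attribute order (follows the coding order used in this paper).
ATTRS = ["BA", "US", "UT", "PGS", "INF", "WM"]

# Direct prerequisites in Hierarchy 1 as described in the Results:
# BA -> US -> UT, and UT -> PGS, UT -> INF, UT -> WM.
PREREQS = {
    "BA": [],
    "US": ["BA"],
    "UT": ["US"],
    "PGS": ["UT"],
    "INF": ["UT"],
    "WM": ["UT"],
}


def all_prereqs(attr, prereqs):
    """Return every direct or indirect prerequisite of attr (its reachability set)."""
    seen = set()
    stack = list(prereqs[attr])
    while stack:
        a = stack.pop()
        if a not in seen:
            seen.add(a)
            stack.extend(prereqs[a])
    return seen


def permissible_patterns(attrs, prereqs, always_mastered=("BA",)):
    """List attribute patterns (bit strings) that respect the hierarchy.

    A pattern is permissible when every mastered attribute also has all of
    its prerequisites mastered. Attributes in always_mastered must be 1,
    mirroring the assumption that all examinees have mastered BA.
    """
    closure = {a: all_prereqs(a, prereqs) for a in attrs}
    required = set(always_mastered)
    patterns = []
    for bits in product((0, 1), repeat=len(attrs)):
        mastered = {a for a, bit in zip(attrs, bits) if bit}
        if required <= mastered and all(closure[a] <= mastered for a in mastered):
            patterns.append("".join(str(bit) for bit in bits))
    return patterns


patterns = permissible_patterns(ATTRS, PREREQS)
print(len(patterns))  # 10
```

Under these assumptions the sketch finds 10 permissible patterns, the same number reported for Hierarchy 1; a pattern such as 110100 (PGS mastered without UT) is rejected because it violates the hierarchy.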
The AHM also offers a more convenient way of providing cognitive feedback to students. This feedback is obtained by mapping observed examinee response patterns onto the expected examinee response patterns derived from the attribute hierarchy. A student with a given observed response pattern is expected to have mastered the attributes implied by the corresponding expected response pattern; conversely, the student may need more work on the attributes that pattern does not include.

Method

Instrument

The data used in the present study were student responses to the multiple-choice items in the reading comprehension section of the 2002 administration of a provincial College English Test in China. Two random samples of 1,500 examinees each were drawn from the response data, one for an initial analysis and one for a cross-validation analysis.

Procedure

The study was conducted in two stages. In the first stage, the substantive features of the items were analyzed: the attribute hierarchies were specified and the cognitive attributes measured by the test items were coded. The second stage was a psychometric analysis of the students’ responses to the test items: expected examinee response patterns were generated, observed response patterns were classified, and the attribute hierarchies were evaluated.

Substantive Analysis

Specification of attribute hierarchy. The AHM begins by specifying attribute hierarchies according to cognitive theories of a content domain. In the present study, the cognitive attributes generally used by students in second and foreign language (L2) reading comprehension, and the relationships among them, were first identified through a review of the L2 reading comprehension literature (e.g., Farhady & Hessamy, 2005; Urquhart & Weir, 1998).
Then, based on the relationships among the cognitive attributes, two attribute hierarchies were specified. Specifying two hierarchies allowed us to compare which one more accurately reflected the cognitive attributes students used to answer the test items.

Coding the cognitive attributes. Two expert raters who were experienced in teaching English as a foreign language and familiar with the test-taker population were recruited. Because an item can measure multiple cognitive attributes, the raters were instructed to code all cognitive attributes measured by each item. Items on which the raters showed good agreement were selected for the AHM analysis.

Psychometric Analysis

Generation of expected examinee response patterns. After the attribute hierarchies were specified and the items’ cognitive attributes were coded, expected examinee response patterns were generated by matching the cognitive attributes required by the items with the possible attribute patterns of expected examinees. An expected examinee is a hypothetical examinee who correctly answers exactly those items whose required attributes he or she has mastered. If the attribute pattern of an expected examinee includes all the cognitive attributes required by an item, this examinee is expected to answer the item correctly. Conversely, if at least one attribute required by the item is missing from the examinee’s attribute pattern, the examinee is expected to answer the item incorrectly.

Classification of observed response patterns. After the expected response patterns were generated, the observed response patterns were classified in three steps. First, the parameters of each item were calibrated with the three-parameter logistic (3PL) model using BILOG-MG (du Toit, 2003).
Second, each observed response pattern was compared against all expected response patterns, and slips of both forms, 0 → 1 (an expected 0 observed as 1) and 1 → 0 (an expected 1 observed as 0), were identified. Third, the product of the probabilities of these slips was calculated using the item parameters calibrated in the first step. This product gave the likelihood that the observed response pattern was generated from each expected response pattern at its associated ability level. The observed response pattern was then classified as generated from the expected response pattern with the largest likelihood.

Evaluation of the attribute hierarchies. To examine whether the specified attribute hierarchies accurately reflect the cognitive attributes used by the students, the hierarchy consistency index (HCI; Cui, Leighton, Gierl, & Hunka, 2006) was used. HCI values range from −1 to +1. Low HCI values indicate inconsistency between the observed response patterns and the expected response patterns specified by the attribute hierarchy, suggesting that the hierarchy needs improvement. The mean and standard deviation of the HCI can be used as indicators of overall model-data fit: a high mean and a low standard deviation suggest that the observed response patterns fit the AHM model well. In other words, the HCI results help validate the specified attribute hierarchies.

Results

Substantive Analysis

Specifying the Attribute Hierarchies

Based on a review of the L2 reading comprehension literature, eight cognitive attributes involved in L2 reading and in responding to reading items were identified (e.g., Alderson, 2000; Cain, Oakhill, & Lemmon, 2004; Perfetti, 1985, 1988; Urquhart & Weir, 1998):

(1) basic language knowledge (BA), such as word recognition and basic syntactic knowledge;
(2) understanding the content, form, and function of sentences (US);
(3) understanding the content, form, and function of larger sections of text (UT);
(4) analyzing authors’ purposes, goals, and strategies (PGS);
(5) determining word meaning in context (WM);
(6) making inferences based on background knowledge (INF);
(7) understanding text with difficult vocabulary (VC); and
(8) understanding text with complex syntactic structure (SY).

Among the eight attributes, BA is assumed to have been mastered by all examinees and is the prerequisite for all other attributes. US is a sentence-level attribute and is the prerequisite for UT. PGS, WM, and INF all require understanding contextual information and therefore require UT. Based on the interrelationships among these six attributes, Hierarchy 1 was specified (Figure 1a). To achieve finer granularity, Hierarchy 2 (Figure 1b) was specified with VC and SY included. These two attributes were included because, when encountering difficult vocabulary or sentences, readers must rely on morphological clues or contextual cues (e.g., Perfetti, 1985). In other words, understanding difficult vocabulary and sentences involves cognitive attributes different from those involved in understanding easy vocabulary and sentences.

Coding the Attributes

There were 20 items in the reading section of the test. However, four items were not coded because they had negative discrimination. Also, as mentioned previously, BA was assumed to have been mastered by all examinees and thus was not coded by the raters. For the attributes US, UT, PGS, INF, and WM, the raters first discussed the meaning of these attributes.
Once they reached a shared understanding of these attributes, they were instructed to code the cognitive attributes measured by the items. For VC and SY, the raters were first instructed to identify the parts of the text students needed to read to answer each item correctly. They were then asked to rate the difficulty of the key vocabulary and the syntactic complexity of these parts. If the key vocabulary was considered difficult for the target test-taker population, VC was considered to be measured by the item. Similarly, if the syntactic structure of the text was considered complex, SY was considered to be measured by the item. In the end, the raters agreed on nine of the remaining 16 items, which were then used in the subsequent analyses. The coding results for the nine items are presented in Table 1.

Psychometric Analysis

Generation of Expected Examinee Response Patterns

The expected examinee response patterns, the corresponding ability levels, and the expected examinee attribute patterns generated from the two attribute hierarchies are shown in Tables 2a and 2b, respectively. Hierarchy 1, with six cognitive attributes, generated 10 unique attribute patterns and 10 unique expected examinee response patterns. With two more cognitive attributes, Hierarchy 2 generated 37 unique attribute patterns but only 25 unique expected examinee response patterns. This occurred because, with a small number of items, different attribute patterns can generate the same expected response pattern. For example, the attribute patterns “11100000” and “11100100” both produce the expected response pattern “110000000”: every item that measures WM also measures at least one attribute outside BA, US, UT, and WM, so adding WM to BA, US, and UT does not enable an expected examinee to answer any additional item.
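The collapse of attribute patterns onto a smaller set of response patterns follows directly from the rule that an expected examinee answers an item correctly only when every attribute the item requires has been mastered. The Python sketch below applies that rule to the 10 attribute patterns permitted by Hierarchy 1, assuming the attribute order BA, US, UT, PGS, INF, WM. The four items and their attribute codes are hypothetical, not the coding in Table 1, and the function name expected_response is our own.

```python
from collections import defaultdict

# Assumed attribute order for the bit strings below.
ATTRS = ["BA", "US", "UT", "PGS", "INF", "WM"]

# Attribute patterns permitted by Hierarchy 1 (BA always mastered).
ATTRIBUTE_PATTERNS = [
    "100000", "110000", "111000", "111100", "111010",
    "111001", "111110", "111101", "111011", "111111",
]

# HYPOTHETICAL item codes (not the study's Table 1): attributes each item requires.
ITEMS = [
    {"BA", "US"},
    {"BA", "US", "UT"},
    {"BA", "US", "UT", "PGS"},
    {"BA", "US", "UT", "INF", "WM"},
]


def expected_response(attr_pattern, attrs, items):
    """Item j is answered correctly iff all attributes it requires are mastered."""
    mastered = {a for a, bit in zip(attrs, attr_pattern) if bit == "1"}
    return "".join("1" if required <= mastered else "0" for required in items)


# Group attribute patterns by the expected response pattern they produce.
groups = defaultdict(list)
for pattern in ATTRIBUTE_PATTERNS:
    groups[expected_response(pattern, ATTRS, ITEMS)].append(pattern)

for response, attr_patterns in groups.items():
    print(response, attr_patterns)
```

With these hypothetical items, the 10 attribute patterns yield only 6 distinct expected response patterns; for example, 111000, 111010, and 111001 all map to 1100. The same mechanism explains why Hierarchy 2's 37 attribute patterns produce only 25 expected response patterns on the nine coded items.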
As an illustration, Row 3 of Table 2a should be interpreted as follows: an examinee who has mastered BA, US, and UT is expected to answer items 1, 2, 5, 7, and 8 correctly, producing the expected examinee response pattern “110010110”. In contrast, Row 4 of Table 2b indicates that an examinee who has mastered only BA, US, and UT is expected to answer only items 1 and 2 correctly, not items 5, 7, and 8, because items 5, 7, and 8 measure VC or SY in addition to BA, US, and UT. In other words, Hierarchy 2 has finer granularity than Hierarchy 1.

Classification of Observed Response Patterns

To classify the observed response patterns, each observed pattern was compared against all expected examinee response patterns, and slips of the form 0 → 1 and 1 → 0 were identified. The product of the slip probabilities gave the likelihood that the observed pattern was generated from a given expected examinee response pattern at its ability level. An examinee with a given observed response pattern is likely to have mastered the attributes associated with the expected response pattern that has the largest likelihood. Summaries of the classification results for Hierarchies 1 and 2 are displayed in Tables 3a and 3b, respectively. Tables 3a and 3b show that different numbers of examinees were classified into different ability levels, and that the number of examinees classified into each ability level is comparable across the two samples. In addition to classifying the examinees, the tables indicate which attributes the examinees have and have not mastered. For example, Table 3a indicates that 495 examinees in Sample 1 were classified into the ability level of 0.529 and 31 examinees into the ability level of 0.544. Although the ability levels of these two groups are similar, the attributes they have mastered are not the same.
The examinees at the ability level of 0.529 have mastered BA, US, UT, and PGS, whereas those at 0.544 have mastered BA, US, UT, INF, and WM. In other words, with the AHM, examinees receive not only an overall ability estimate but also cognitive diagnostic information about which attributes they have likely mastered and which they still need to acquire or strengthen.

Evaluation of the Two Attribute Hierarchies

Two attribute hierarchies with different granularity were used in the present study. To evaluate their model-data fit, the means and standard deviations of the HCI (Cui et al., 2006) were calculated for both hierarchies; the results are presented in Table 4. Because there are currently no objective criteria for HCI values, the two hierarchies can only be evaluated in comparative terms. As Table 4 shows, each hierarchy produced comparable HCI means and standard deviations across the two samples. Moreover, in both samples, Hierarchy 2 produced a higher HCI mean and a lower standard deviation than Hierarchy 1. These results indicate that Hierarchy 2, the more fine-grained hierarchy, more accurately reflects the cognitive attributes used by the examinees, since it provides a better fit to the observed response data.

An Illustration

To illustrate how the AHM can be used to make diagnostic inferences, Tables 5a and 5b show the classification results for two observed response patterns, “110010101” (henceforth ORP1) and “110110100” (henceforth ORP2). Table 5a shows the classification results under Hierarchy 1, and Table 5b shows those under Hierarchy 2.
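The likelihood comparison behind these classifications can be sketched in Python as follows. For an expected pattern m with ability level θ_m, the sketch multiplies 1 − P_j(θ_m) for every 1 → 0 slip (item j expected correct but answered incorrectly) and P_j(θ_m) for every 0 → 1 slip, where P_j is the 3PL item response function, and then selects the expected pattern with the largest product. The nine (a, b, c) item parameters are hypothetical stand-ins for the BILOG-MG estimates, the scaling constant D = 1.7 is an assumption, and only the two Hierarchy 1 expected patterns reported in Table 5a for ORP1 and ORP2 are included, whereas the actual analysis compared each observed pattern against all expected patterns.

```python
import math

D = 1.7  # logistic scaling constant; an assumption, not taken from the study


def p_3pl(theta, a, b, c):
    """Three-parameter logistic probability of a correct response."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def slip_likelihood(observed, expected, theta, params):
    """Product of slip probabilities for one expected pattern at ability theta.

    A 1 -> 0 slip (expected correct, observed incorrect) contributes 1 - P_j(theta);
    a 0 -> 1 slip (expected incorrect, observed correct) contributes P_j(theta).
    Items that match the expected pattern contribute a factor of 1.
    """
    likelihood = 1.0
    for obs, exp, (a, b, c) in zip(observed, expected, params):
        p = p_3pl(theta, a, b, c)
        if exp == "1" and obs == "0":
            likelihood *= 1.0 - p
        elif exp == "0" and obs == "1":
            likelihood *= p
    return likelihood


def classify(observed, candidates, params):
    """Return (expected pattern, theta, likelihood) with the largest likelihood."""
    scored = [(exp, theta, slip_likelihood(observed, exp, theta, params))
              for exp, theta in candidates]
    return max(scored, key=lambda row: row[2])


# HYPOTHETICAL (a, b, c) parameters for the nine items; not the study's estimates.
PARAMS = [
    (1.0, -1.5, 0.20), (0.9, -1.0, 0.20), (1.1, 0.8, 0.25),
    (0.8, 0.5, 0.20), (1.2, -0.5, 0.20), (1.0, 1.2, 0.25),
    (0.9, -0.3, 0.20), (1.0, 0.0, 0.20), (1.3, 0.6, 0.20),
]

# Two Hierarchy 1 expected patterns and their ability levels (Table 5a).
CANDIDATES = [("110010110", -0.130), ("110010111", 0.450)]

for label, orp in [("ORP1", "110010101"), ("ORP2", "110110100")]:
    print(label, classify(orp, CANDIDATES, PARAMS))
```

With these made-up parameters, ORP1 is assigned to 110010111 and ORP2 to 110010110, which agrees with the Hierarchy 1 classifications in Table 5a; different parameters could change the outcome. Because a pattern with no slips has a likelihood of exactly 1, an observed pattern that matches an expected pattern receives the maximum possible likelihood for that expected pattern.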
As Tables 5a and 5b show, although ORP1 and ORP2 each contain five correct responses, they were classified into different ability levels with different attribute patterns. In Table 5a, Hierarchy 1 classified ORP1 into the ability level of 0.450 because the expected examinee response pattern “110010111”, which is associated with this ability level, had the largest likelihood among all expected examinee response patterns. The corresponding attribute pattern is “111001”, indicating that students with ORP1 have mastered BA, US, UT, and WM. ORP2, on the other hand, was classified into the ability level of −0.130 because the expected examinee response pattern “110010110”, which is associated with this ability level, had the largest likelihood. The corresponding attribute pattern is “111000”, indicating that students with ORP2 have mastered only BA, US, and UT, not WM, and thus received a lower ability estimate.

Tables 5a and 5b also show that Hierarchy 2, with its finer granularity, provided diagnostic information on two more attributes, VC and SY, than Hierarchy 1. For example, Hierarchy 1 indicated only that students with ORP1 have mastered BA, US, UT, and WM and need more work on PGS and INF; it could not provide any information about VC and SY, because its granularity is not fine enough to include these two attributes. Hierarchy 2, in contrast, indicated that the same students have mastered BA, US, UT, WM, and VC and need more work on PGS, INF, and SY, thus providing more diagnostic information than Hierarchy 1.
These results also indicate the importance of selecting an appropriate cognitive model for a content domain when using the AHM.

Discussion

The present study was designed to illustrate the use of the AHM by applying it to an L2 English reading test. The AHM integrates cognitive psychology and psychometrics: it starts with cognitive theories of a content domain and then involves substantive and psychometric analyses. Accordingly, the present study was conducted in two stages, with substantive analyses of the test followed by psychometric analyses. In the substantive analyses, eight cognitive attributes involved in L2 reading comprehension were identified, and two attribute hierarchies with different granularity were specified based on a review of the L2 reading comprehension literature (Figures 1a and 1b). The purpose of specifying two hierarchies was to determine which one more accurately reflected the cognitive attributes students used in answering the test items. Two expert raters then coded the cognitive attributes measured by the test items, and the nine items on which good agreement was achieved were selected for the psychometric analyses (Table 1). The psychometric analyses included generating expected examinee response patterns, classifying observed response patterns, and evaluating the attribute hierarchies. The two attribute hierarchies generated different numbers of expected examinee response patterns (Tables 2a and 2b). Based on the two hierarchies and their expected response patterns, the examinees’ observed response patterns were classified into different ability levels with different attribute patterns (Tables 3a and 3b). Through this classification, the AHM provides the examinees not only with an overall ability estimate but also with important diagnostic information.
Thus, examinees can learn which attributes they have mastered and which require additional work. Finally, the performance of the two attribute hierarchies was evaluated using the HCI. The HCI results indicated that Hierarchy 2 more accurately reflected the cognitive model underlying the examinee data on the test, as shown by its higher mean and lower standard deviation. Moreover, with its finer granularity, Hierarchy 2 provided richer diagnostic information to the examinees.

The present study is the first to apply the AHM to empirical testing data. As a cognitively based psychometric approach, the AHM has three clear advantages over traditional psychometric scoring methods. First, in addition to an overall ability estimate, the AHM provided more detailed cognitive diagnostic information, with which examinees could focus their efforts on the attributes they have not yet mastered. In other words, the AHM can be instrumental in improving student learning. Second, the AHM provides useful information for construct validation. As an indispensable component of the AHM, the substantive analysis yields information about what the items are intended to measure; if the HCI results then indicate good fit between the attribute hierarchies and the observed response data, positive evidence for the construct validity of the test is obtained. Third, the AHM provides a mechanism for feeding results back to the cognitive theories of a content domain. For example, in the current study, Hierarchy 2 was found to reflect more accurately the cognitive attributes students use on the test. Such results indicate that VC and SY are two important cognitive attributes that should not be ignored in a cognitive model of L2 reading comprehension.
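The HCI on which both the hierarchy comparison and the construct-validation argument rest can be sketched in Python as below, following our reading of Cui et al. (2006): for each item j an examinee answers correctly, every other item g whose required attributes are a subset of item j's is compared, a misfit is counted when g is answered incorrectly, and HCI = 1 − 2 × (misfits / comparisons). Treating identical attribute sets as subsets, returning None when no comparisons exist, excluding such examinees from the summary, and using the sample standard deviation are our assumptions; the three items and the response strings are hypothetical.

```python
from statistics import mean, stdev


def hci(responses, item_attrs):
    """Hierarchy consistency index for one examinee's 0/1 response string.

    For every item j answered correctly, each other item g whose required
    attributes are a subset of item j's is one comparison; answering g
    incorrectly counts as a misfit. HCI = 1 - 2 * misfits / comparisons.
    """
    misfits = 0
    comparisons = 0
    for j, x_j in enumerate(responses):
        if x_j != "1":
            continue
        for g, x_g in enumerate(responses):
            if g != j and item_attrs[g] <= item_attrs[j]:
                comparisons += 1
                if x_g == "0":
                    misfits += 1
    if comparisons == 0:
        return None  # HCI is undefined when no comparisons exist
    return 1.0 - 2.0 * misfits / comparisons


def hci_summary(patterns, item_attrs):
    """Mean and sample standard deviation of the HCI over examinees with a defined HCI."""
    values = [v for v in (hci(p, item_attrs) for p in patterns) if v is not None]
    return mean(values), stdev(values)


# HYPOTHETICAL three-item example; attribute sets are illustrative only.
ITEMS = [{"BA", "US"}, {"BA", "US", "UT"}, {"BA", "US", "UT", "PGS"}]

print(hci("111", ITEMS))  # 1.0: fully consistent with the hierarchy
print(hci("001", ITEMS))  # -1.0: hardest item right, its prerequisite items wrong
```

A response string consistent with the hierarchy scores 1.0, while answering the most demanding item correctly but its prerequisite items incorrectly scores −1.0; a higher mean and lower standard deviation across examinees therefore indicate better fit, which is how the Table 4 comparison was read.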


Similar resources

Constructing and Validating a Q-Matrix for Cognitive Diagnostic Analysis of a Reading Comprehension Test Battery

Of paramount importance in the study of cognitive diagnostic assessment (CDA) is the absence of tests developed for small-scale diagnostic purposes. Currently, much of the research carried out has been mainly on large-scale tests, e.g., TOEFL, MELAB, IELTS, etc. Even so, formative language assessment with a focus on informing instruction and engaging in identification of student’s strengths and...


The Hierarchy Consistency Index: Evaluating Person Fit for Cognitive Diagnostic Assessment

The objective of the present paper is to introduce a person-fit statistic called hierarchy consistency index (HCI) to help detect misfitting item-response vectors for tests developed and analyzed based on a cognitive model. The HCI ranges from -1.0 to 1.0, with values close to -1.0 indicating that students respond unexpectedly or differently from the responses expected under a given cognitive m...


Selecting the Best Fit Model in Cognitive Diagnostic Assessment: Differential Item Functioning Detection in the Reading Comprehension of the PhD Nationwide Admission Test

This study was an attempt to provide detailed information of the strengths and weaknesses of test takers' real ability through cognitive diagnostic assessment, and to detect differential item functioning in each test item. The rationale for using CDA was that it estimates an item's discrimination power, whereas classical test theory or item response theory depicts between rather within item mu...


Investigating the Relatedness of Cloze-Elide Test, Multiple-Choice Cloze Test, and C-test as Measures of Reading Comprehension

Reading comprehension ability consists of multiple cognitive processes, and cloze tests have long been claimed to measure this ability as a whole. However, since the introduction of cloze test, different varieties of it have been proposed by the testers. Thus, the present study was an attempt to examine the relatedness of Cloze-Elide test, Multiple-choice (MC) cloze test, and C-test as three di...


Testing Expert-Based and Student-Based Cognitive Models: An Application of the Attribute Hierarchy Method and Hierarchy Consistency Index

The objective of the present investigation was to compare the adequacy of two cognitive models for predicting examinee performance on a sample of algebra I and II items from the March 2005 administration of the SAT™. The two models included one generated from verbal reports provided by 21 examinees as they solved the SAT™ items, and the other generated from the judgment of a content expert. U...




Publication date: 2006